Abstract

The TCP window size process appears in the modeling of the famous Transmission
Control Protocol used for data transmission over the Internet. This continuous time
Markov process takes its values in $[0,\infty)$, is ergodic and irreversible. It belongs to the
Additive Increase Multiplicative Decrease class of processes. The sample paths are
piecewise linear deterministic and the whole randomness of the dynamics comes from
the jump mechanism. Several aspects of this process have already been investigated
in the literature. In the present paper, we mainly get quantitative estimates for the
convergence to equilibrium, in terms of the $W_1$ Wasserstein coupling distance, for the
process and also for its embedded chain.
1 Introduction

The TCP protocol is one of the main data transmission protocols of the Internet. It
has been designed to adapt to the various traffic conditions of the actual network. For
a connection, the maximum number of packets that can be sent at each round is given
by a variable W, called the congestion window size. If all the W packets are successfully
transmitted, then W is increased by 1; otherwise it is multiplied by $\delta \in [0,1)$ (detection of
a congestion). As shown in [THl US], a correct scaling of this process leads to a continuous
time Markov process, called the general TCP window size process. This process $X = (X_t)_{t\ge0}$
has $[0,\infty)$ as state space and its infinitesimal generator is given, for any smooth function
$f : [0,\infty) \to \mathbb{R}$, by

$$L(f)(x) = f'(x) + x\int_0^1 \big(f(hx) - f(x)\big)\,H(dh) \qquad (1)$$

for some probability measure H supported in $[0,1)$. This window size $(X_t)_{t\ge0}$ increases
linearly (this is the $f'$ part of L) until the reception of a congestion signal, which forces
the reduction of the window size by a multiplicative factor of law H, or equal to $\delta$ in
the simplest case (this is the jump part of L). The sample paths of X are deterministic
between jumps, the jumps are multiplicative, and the whole randomness of the dynamics
relies on the jump mechanism. Of course, the randomness of X may also come from a
random initial value. The process $(X_t)_{t\ge0}$ appears as an Additive Increase Multiplicative
Decrease process (AIMD), but also as a very special Piecewise Deterministic Markov
Process (PDMP), a class initially introduced in [JJ]. In this direction, [TJJ] gives a generalization
of the scaling procedure to interpret various PDMPs as the limit of discrete time Markov
chains. In the real world (Internet), the AIMD mechanism allows a good compromise
between the minimization of network congestion time and the maximization of mean
throughput.

Our aim in this paper is to get quantitative estimates for the convergence to equilibrium
of this general TCP window size process. This process X is ergodic and admits a unique
invariant law, as can be checked using a suitable Lyapunov function (for instance $V(x) = 1 + x$,
see e.g. [2JEJQI)] for the Meyn-Tweedie-Foster-Lyapunov technique). Nevertheless,
this process is irreversible, since time reversed sample paths are not sample paths, and it
has unbounded support. This makes Meyn-Tweedie-Foster-Lyapunov techniques ineffective for
the derivation of quantitative exponential ergodicity.

The embedded chain $\widehat X$ of the process X is the sequence of its positions just after
a jump. It is a homogeneous discrete time Markov chain with state space $[0,\infty)$. As
already observed in [5], it is also the square root of a first order auto-regressive process
with non-Gaussian innovations and random coefficients. We obtain the following results
concerning $\widehat X$. We first show that it admits a unique invariant probability measure $\nu$, and
that it converges in law to $\nu$ given any (random) initial value $\widehat X_0$. More precisely, using a
coupling technique on trajectories, we prove an ergodic theorem of geometric convergence
to equilibrium with respect to any Wasserstein distance. Then we provide non asymptotic
concentration bounds, thanks to Gross's logarithmic Sobolev inequalities.

Similarly, the continuous time process X admits a unique invariant probability measure
$\mu$, and converges in law to $\mu$ for any (random) initial value $X_0$. The reader may find
explicit series for the moments of $\mu$ and $\nu$ in [10, 14, 15]. Nevertheless, quantitative
convergence to equilibrium has not yet been obtained. We will address this question for a
slight generalization of the TCP process given by its infinitesimal generator:

$$L_a(f)(x) = f'(x) + (x + a)\int_0^1 \big(f(hx) - f(x)\big)\,H(dh) \qquad (2)$$

where $a \ge 0$. We obtain a satisfactory answer if $a > 0$. In this case we first show the existence
of a coupling with exponential decay. We use this result to prove an exponential ergodic
theorem in terms of Wasserstein distance. Finally, we provide a uniform bound over the
starting law that implies strong ergodicity. This kind of uniform estimate, though classical
for processes on a compact set, is rather unusual for real valued processes. However,
if $a = 0$, we are not able to derive exponential bounds.

The remainder of the paper is organized as follows. In the next preliminary section,
we introduce some notations and give the statements of the main results. In Section 3,
we focus on the embedded chain $\widehat X$ and establish its convergence to equilibrium. The last
section is devoted to the study of the continuous time process X and its generalization,
and contains the proofs of the results announced in Section 2.

2 Notations and main results 

Let us first explain how the trajectories of the process X may be constructed. The jump
rate (or jump intensity) of X is given by $\lambda(x) = x$ for every $x \in [0,\infty)$. If $X_0 = x$ then
the process will experience its first jump at a random time T solution of

$$\int_0^T \lambda(X_s)\,ds = E,$$

where E is an exponential random variable of unit mean. Since the trajectories of X are
piecewise deterministic with slope 1, this is nothing else but

$$\int_0^T \lambda(x + s)\,ds = xT + \frac{T^2}{2} = E,$$

which leads to $T = \sqrt{x^2 + 2E} - x$. Then, the sample paths of the process X generated by
(1) may be constructed recursively as follows. Let $X_0$ be its non-negative random initial
position, $(E_n)_{n\ge1}$ be a sequence of i.i.d. exponential random variables of unit mean, and
$(Q_n)_{n\ge1}$ be a sequence of i.i.d. random variables of law H. Assume that $X_0$, $(E_n)_{n\ge1}$
and $(Q_n)_{n\ge1}$ are independent. We define by induction the jump times $(T_n)_{n\ge1}$ and the
positions just after the jumps $(X_{T_n})_{n\ge1}$ as



$$T_n = T_{n-1} + \sqrt{X_{T_{n-1}}^2 + 2E_n} - X_{T_{n-1}} \quad\text{and}\quad X_{T_n} = Q_n\sqrt{X_{T_{n-1}}^2 + 2E_n}. \qquad (3)$$

If we set $T_0 = 0$, then for every $n \ge 0$ and $t \in [T_n, T_{n+1})$, we have $X_t = X_{T_n} + t - T_n$ and
in particular, $X_{T_n} = Q_n X_{T_n^-}$ for $n \ge 1$. For every $t \ge 0$, one can also write the series representation

$$X_t = \sum_{n=0}^{\infty}\big(X_{T_n} + t - T_n\big)\mathbf{1}_{[T_n, T_{n+1})}(t).$$
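The recursion (3) translates directly into a simulation procedure. The Python sketch below is only an illustration: the function names `simulate_tcp_jumps` and `position_at` are ours, and the argument `sample_q` is a hypothetical callable returning a draw from H. It returns the jump times $T_n$ and the post-jump positions $X_{T_n}$, from which $X_t$ is recovered through the series representation above.

```python
import math
import random


def simulate_tcp_jumps(x0, n_jumps, sample_q, rng):
    """Jump times (T_n) and post-jump positions (X_{T_n}) of the process (1),
    built with the recursion (3). sample_q(rng) must return a draw from H."""
    times, positions = [0.0], [x0]
    x = x0
    for _ in range(n_jumps):
        e = rng.expovariate(1.0)                 # E_n, exponential of unit mean
        pre_jump = math.sqrt(x * x + 2.0 * e)    # X_{T_n^-}, position just before the jump
        times.append(times[-1] + pre_jump - x)   # T_n - T_{n-1} solves x*T + T^2/2 = E_n
        x = sample_q(rng) * pre_jump             # multiplicative jump by Q_n ~ H
        positions.append(x)
    return times, positions


def position_at(t, times, positions):
    """X_t = X_{T_n} + (t - T_n) for t in [T_n, T_{n+1}); only valid for t < T_N."""
    if not 0.0 <= t < times[-1]:
        raise ValueError("t must lie in [0, T_N) for the simulated jumps")
    n = max(i for i, s in enumerate(times) if s <= t)
    return positions[n] + (t - times[n])
```

For instance, `simulate_tcp_jumps(1.0, 100, lambda r: 0.5, random.Random(0))` simulates the case $H = \delta_{1/2}$; the list of positions is exactly the embedded chain.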

The sequence $\widehat X = (\widehat X_n)_{n\ge0} = (X_{T_n})_{n\ge0}$ is the embedded chain of X. According to (3), this discrete
time Markov chain with state space $[0,\infty)$ satisfies the recursion

$$\widehat X_{n+1} = Q_{n+1}\sqrt{\widehat X_n^2 + 2E_{n+1}}. \qquad (4)$$

Thus, since $\widehat X_{n+1}^2 = Q_{n+1}^2\widehat X_n^2 + 2Q_{n+1}^2E_{n+1}$, the embedded chain $\widehat X$ is the square root of a first order auto-regressive process
with non-Gaussian innovations $(2Q_n^2E_n)_{n\ge1}$ and random coefficients $(Q_n^2)_{n\ge1}$, as already
observed in [5]. The embedded chain $\widehat X$ is homogeneous, and its transition kernel K is
given, for any $x \ge 0$ and every bounded measurable $f : [0,\infty) \to \mathbb{R}$, by the formula

$$K(f)(x) = \int_0^\infty f(y)\,K(x,dy) = \mathbb{E}\big[f\big(Q\sqrt{x^2 + 2E}\big)\big] \qquad (5)$$

where E is an exponential random variable of unit mean and Q is a random variable of law
H independent of E. We show in Section 3 that the embedded Markov chain $\widehat X$ admits
a unique invariant probability measure $\nu$, and converges in law to $\nu$ given any (random)
initial value $\widehat X_0$. Similarly, the continuous time process X admits a unique invariant
probability measure $\mu$, and converges in law to $\mu$ for any (random) initial value $X_0$. We
recall that explicit series for the moments of $\mu$ and $\nu$ can be found in [10, 14, 15].

Despite the apparent simplicity of the dynamics (1), the quantitative study of the long
time behavior of X is not easy, mainly because the jump rate depends on the position x of
the process. Our strategy is to couple two trajectories starting at two different points in
such a way that they get closer and closer. It seems difficult to make the two trajectories stick together
in order to get total variation estimates, since the sample paths are parallel between jump
times. Thus, we provide quantitative bounds in terms of the Wasserstein coupling distance.
Recall that for every $p \ge 1$, the $W_p$ Wasserstein distance between two laws $\mu$ and $\nu$ on $\mathbb{R}$
with finite p-th moment is defined by

$$W_p(\mu,\nu) = \Big(\inf_{\Pi}\int |x - y|^p\,\Pi(dx,dy)\Big)^{1/p} \qquad (6)$$

where the infimum runs over all couplings $\Pi$ of $\mu$ and $\nu$. In other words, $\Pi$ runs over the
convex set of laws on $\mathbb{R}^2$ with marginals $\mu$ and $\nu$, see e.g. [19, 20]. It is well known that for
any $p \ge 1$, the convergence in $W_p$ Wasserstein distance is equivalent to weak convergence
together with convergence of all moments up to order p.
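On the real line, the infimum in (6) is attained by the monotone (quantile) coupling, so for two empirical measures with the same number of atoms, $W_p$ reduces to pairing the sorted samples. This is the usual way to estimate Wasserstein distances from Monte Carlo output; the helper below is a minimal Python sketch, and its name is ours.

```python
def wasserstein_p_empirical(xs, ys, p=1.0):
    """W_p between two empirical measures on R with the same number of atoms.
    On the real line the optimal coupling in (6) is the monotone (quantile)
    coupling, so it suffices to pair the sorted samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    total = sum(abs(a - b) ** p for a, b in zip(xs, ys))
    return (total / len(xs)) ** (1.0 / p)
```

With unequal sample sizes one would instead compare the two empirical quantile functions on a common grid.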

The jump part of L ensures that the process will remain essentially in a compact set:
in a way, the jumps act like a confining potential. On the other hand, the jump rate is
small when the process is close to the origin. This prevents the decay of the Wasserstein
distance from being exponential for small times.

In Section 3 we first establish the following geometric convergence to equilibrium of
the embedded Markov chain $\widehat X$ for any Wasserstein distance.

Theorem 2.1 (Wasserstein exponential ergodicity for the generic embedded chain). Let
$X = (X_t)_{t\ge0}$ and $Y = (Y_t)_{t\ge0}$ be two processes generated by (1). Assume that $\mathcal{L}(X_0)$ and
$\mathcal{L}(Y_0)$ have finite p-th moment for some real $p \ge 1$. Let $\widehat X$ and $\widehat Y$ be the embedded chains
of X and Y. Then, for any $n \ge 0$, with a random variable $Q \sim H$,

$$W_p\big(\mathcal{L}(\widehat X_n), \mathcal{L}(\widehat Y_n)\big) \le \mathbb{E}(Q^p)^{n/p}\,W_p\big(\mathcal{L}(\widehat X_0), \mathcal{L}(\widehat Y_0)\big).$$

In particular, if $\nu$ is the invariant law of $\widehat X$ then

$$W_p\big(\mathcal{L}(\widehat X_n), \nu\big) \le \mathbb{E}(Q^p)^{n/p}\,W_p\big(\mathcal{L}(\widehat X_0), \nu\big).$$

We also establish in Section 3 non asymptotic concentration bounds in the ergodic
theorem by using Gross logarithmic Sobolev inequalities:

Theorem 2.2 (Gaussian deviations for the ergodic theorem for the embedded chain). Let
$\widehat X$ be the embedded chain associated to (1) and starting from $\widehat X_0 = x \ge 0$. Assume that
H is the Dirac mass at point $\delta \in (0,1)$. Then for any $u \ge 0$ and any 1-Lipschitz function
$f : [0,\infty) \to \mathbb{R}$,

$$\mathbb{P}\Big(\Big|\frac{1}{n}\sum_{k=1}^{n} f(\widehat X_k) - \int f\,d\nu\Big| \ge u + \frac{W_1(\delta_x,\nu)}{n(1-\delta)}\Big) \le 2\exp\Big(-\frac{n(1-\delta)^2u^2}{2\delta^2}\Big).$$
The convergence to equilibrium of the continuous time process X with generator (2) is
addressed in Section 4. The idea is to exhibit a coupling, i.e. a Markov process on $[0,\infty)^2$
for which the marginal components are generated by (2) with prescribed initial laws. The
infinitesimal generator $\widetilde L$ of this coupling is defined for every smooth $f : [0,\infty)^2 \to \mathbb{R}$ by

$$\widetilde L(f)(x,y) = \mathrm{div}(f)(x,y) + (x+a)\int_0^1\Big(f(hx,hy)\frac{y+a}{x+a} + f(hx,y)\frac{x-y}{x+a} - f(x,y)\Big)H(dh) \qquad (7)$$

if $x \ge y$ and

$$\widetilde L(f)(x,y) = \mathrm{div}(f)(x,y) + (y+a)\int_0^1\Big(f(hx,hy)\frac{x+a}{y+a} + f(x,hy)\frac{y-x}{y+a} - f(x,y)\Big)H(dh)$$

if $x \le y$, where $\mathrm{div}(f) = \partial_1 f + \partial_2 f$. This coupling is the only one such that the lower
component never jumps alone. Let us give the pathwise interpretation of this coupling.
All the heuristic statements below are made more precise hereafter. The positions of both 
"components" increase linearly with slope 1. The jump rate is an increasing function of 
the position. Thus, "the higher a component is, the sooner it will jump". The dynamics 
of the couple of components is as follows: 

1. After an "appropriate" time which depends only on the initial position of the upper 
component, this one jumps. 

2. Simultaneously, the other one "tosses an appropriate coin", whose probability of
success depends on the positions of the two components, to decide whether or not it
jumps too.

3. In the case of joint jumps, both components use the same multiplicative factor. 

4. Then, we repeat these first three steps again and again.

This coupling provides the following quantitative exponential upper bounds. 

Theorem 2.3 (Wasserstein exponential ergodicity). Assume that $a > 0$. For any processes
$(X_t)_{t\ge0}$ and $(Y_t)_{t\ge0}$ generated by (2) with finite first moment at initial time, and
for any $t \ge 0$, we have

$$W_1\big(\mathcal{L}(X_t), \mathcal{L}(Y_t)\big) \le e^{-a\kappa_1 t}\,W_1\big(\mathcal{L}(X_0), \mathcal{L}(Y_0)\big),$$

where $\kappa_1 = 1 - \int_0^1 h\,H(dh)$. In particular, when $Y_0$ follows the invariant law $\mu$ of (2), we
get for every $t \ge 0$,

$$W_1\big(\mathcal{L}(X_t), \mu\big) \le e^{-a\kappa_1 t}\,W_1\big(\mathcal{L}(X_0), \mu\big).$$

The following theorem, proved in Section 4, shows that the convergence to equilibrium
is in fact uniform over the starting laws, as it could be for a process living in a compact
set.

Theorem 2.4 (Strong ergodicity). Assume that $a > 0$. For two processes $X = (X_t)_{t\ge0}$
and $Y = (Y_t)_{t\ge0}$ generated by (2) with arbitrary initial laws $\mathcal{L}(X_0)$ and $\mathcal{L}(Y_0)$ and for
every t and s such that $t \ge s > 0$, one has

$$W_1\big(\mathcal{L}(X_t), \mathcal{L}(Y_t)\big) \le \frac{2e^{a\kappa_1 s}}{\sqrt{\kappa_1}\tanh(\sqrt{\kappa_1}\,s)}\,e^{-a\kappa_1 t}.$$

Theorem 2.4 provides in particular a uniform bound in $N \in (0,\infty)$ if $X_0 = 0$ and
$Y_0 = N$. This kind of uniform estimate is classical for processes on a compact set but
rather unusual for real valued ones.
Theorem 2.5. Assume that $a = 0$ and that $H = \delta_h$ with $h \in (0,1)$. Then the process
$(X, Y)$ driven by the infinitesimal generator $\widetilde L$ defined in (7) satisfies

$$\frac{d}{dt}\mathbb{E}_{(x,y)}\big(|X_t - Y_t|\big) \le -(1+h)\,\mathbb{E}_{(x,y)}\big(|X_t - Y_t|^2\big) \qquad (8)$$

for any $x, y \ge 0$. In particular, for any $t \ge 0$ and $X_0, Y_0 \ge 0$, we have

$$\mathbb{E}\big(|X_t - Y_t|\big) \le \frac{\mathbb{E}(|X_0 - Y_0|)}{1 + (1+h)\,\mathbb{E}(|X_0 - Y_0|)\,t}. \qquad (9)$$
Open questions and further remarks 

The inequality (8) should provide a better bound than (9). As pointed out in Lemma 4.2,
one can actually expect an exponential rate, but this remains an open problem. One may
also ask for a version involving $W_p$ for any $p \ge 1$, or even the total variation distance.

Beyond the TCP window size dynamics, one may ask about the speed of convergence of
ergodic PDMPs, for which necessary and sufficient ergodicity criteria are already known,
see e.g. [3]. One may also study the long time behavior of interacting processes associated
to (1) or (13), for instance McKean-Vlasov mean field interactions as in [8].



3 Embedded chain 

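It is shown in [5, Proposition 8], by Laplace transform inversion, that if H is a Dirac mass
at point $\delta \in (0,1)$, the invariant measure of the embedded chain $\nu = \nu_\delta$ has Lebesgue
density

$$x \mapsto \frac{1}{\prod_{n=1}^{\infty}(1-\delta^{2n})}\sum_{n=1}^{\infty}\frac{\delta^{-2n}}{\prod_{k=1}^{n-1}(1-\delta^{-2k})}\,x\,e^{-\delta^{-2n}x^2/2}. \qquad (10)$$

It is unimodal, of order $O(x\exp(-\delta^{-2}x^2/2))$ when $x \to \infty$, and all its derivatives vanish at
$x = 0$.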
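Formula (10) is an alternating series, so a numerical sanity check is useful. The Python sketch below is our own illustration (it is not taken from [5]): it evaluates the truncated series and checks, with the trapezoidal rule, that it has total mass 1 and second moment $2\sum_{n\ge1}\delta^{2n} = 2\delta^2/(1-\delta^2)$, the value obtained from the representation $\widehat X_\infty^2 = 2\sum_{n\ge1}\delta^{2n}E_n$ for $H = \delta_\delta$. The truncation levels are arbitrary choices.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def _normalization(delta):
    """prod_{n>=1} (1 - delta^{2n}), truncated after 200 factors."""
    norm = 1.0
    for n in range(1, 201):
        norm *= 1.0 - delta ** (2 * n)
    return norm


def nu_delta_density(x, delta, n_terms=25):
    """Truncated series (10): invariant density of the embedded chain, H = Dirac(delta)."""
    total, denom = 0.0, 1.0      # denom = prod_{k=1}^{n-1} (1 - delta^{-2k})
    for n in range(1, n_terms + 1):
        rate = delta ** (-2 * n)
        total += rate / denom * x * math.exp(-rate * x * x / 2.0)
        denom *= 1.0 - rate
    return total / _normalization(delta)


def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return s * h
```

For $\delta$ close to 1 the coefficients become large and alternate in sign, so more care (or higher precision arithmetic) would be needed.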

If H is not a Dirac mass, the invariant measure $\nu$ of the embedded Markov chain is no
longer explicit. Nevertheless, the recursion formula (4) provides the following result, see
[6, 7], which establishes the existence of an invariant measure with sub-Gaussian tails.

Theorem 3.1 (Convergence of the embedded chain, [6, 7]). Given any $\widehat X_0$, the embedded
Markov chain $\widehat X = (\widehat X_n)_{n\ge0}$ associated to the dynamics (1) converges in distribution to
the law of the random variable

$$\widehat X_\infty = \Big(2\sum_{n=1}^{\infty} Q_1^2\cdots Q_n^2\,E_n\Big)^{1/2},$$

which is a.s. finite, where $E_1, E_2, \ldots$ and $Q_1, Q_2, \ldots$ are independent sequences of i.i.d.
random variables following respectively the exponential law of unit mean and the law H
which appear in (3). In particular, $\nu$ is the unique invariant law of $\widehat X$ and

$$\int e^{sx^2}\,\nu(dx) = \mathbb{E}\Big[\prod_{n=1}^{\infty}\frac{1}{1 - 2sQ_1^2\cdots Q_n^2}\Big],$$

which is finite if $2sq^2 < 1$ and infinite if $2sq^2 > 1$, where $q = \inf\{x : \mathbb{P}(Q \le x) = 1\} \le 1$.
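The series representation $\widehat X_\infty = (2\sum_{n\ge1}Q_1^2\cdots Q_n^2E_n)^{1/2}$ gives a sampling scheme for $\nu$, up to truncation. The Python sketch below stops once the running product $Q_1^2\cdots Q_n^2$ falls below a tolerance; the function name, the tolerance and the sampler argument `sample_q` (a hypothetical callable returning a draw from H) are our own illustrative choices. For $H = \delta_\delta$ the empirical second moment should be close to $2\delta^2/(1-\delta^2)$.

```python
import math
import random


def sample_invariant_embedded(sample_q, rng, tol=1e-14, max_terms=10_000):
    """Approximate draw from nu via X = (2 * sum_{n>=1} Q_1^2...Q_n^2 E_n)^{1/2},
    truncated once the weight Q_1^2...Q_n^2 drops below tol."""
    total, weight = 0.0, 1.0
    for _ in range(max_terms):
        weight *= sample_q(rng) ** 2             # Q_1^2 ... Q_n^2
        if weight < tol:                         # remaining terms are negligible
            break
        total += weight * rng.expovariate(1.0)   # + Q_1^2...Q_n^2 E_n
    return math.sqrt(2.0 * total)
```

With H uniform on $[0,1)$, for instance, one would pass `sample_q=lambda r: r.random()`.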
Let us now turn to our quantitative estimate for the convergence to equilibrium for
the embedded chain. 



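Proof of Theorem 2.1. It is sufficient to provide a good coupling. Let $x \ge 0$ and $y \ge 0$
be two non-negative real numbers, and let $(E_n)_{n\ge1}$ and $(Q_n)_{n\ge1}$ be two independent
sequences of i.i.d. random variables with respective laws the exponential law of unit mean
and the law H which appears in (1). Let $\widehat X$ and $\widehat Y$ be the discrete time Markov chains on
$[0,\infty)$ defined by

$$\widehat X_0 = x \quad\text{and}\quad \widehat X_{n+1} = Q_{n+1}\sqrt{\widehat X_n^2 + 2E_{n+1}} \quad\text{for any } n \ge 0,$$

$$\widehat Y_0 = y \quad\text{and}\quad \widehat Y_{n+1} = Q_{n+1}\sqrt{\widehat Y_n^2 + 2E_{n+1}} \quad\text{for any } n \ge 0.$$

By (3), the law of $\widehat X$ (respectively $\widehat Y$) is the law of the embedded
chain of a process generated by (1) and starting from x (respectively y). Now, for any
$p \ge 1$, since $z \mapsto \sqrt{z^2 + c}$ is a 1-Lipschitz function on $[0,\infty)$ for any $c \ge 0$, and since $Q_{n+1}$ is independent of $(\widehat X_n, \widehat Y_n, E_{n+1})$, we get

$$\mathbb{E}\big(|\widehat X_{n+1} - \widehat Y_{n+1}|^p\big) = \mathbb{E}(Q^p)\,\mathbb{E}\Big(\Big|\sqrt{\widehat X_n^2 + 2E_{n+1}} - \sqrt{\widehat Y_n^2 + 2E_{n+1}}\Big|^p\Big) \le \mathbb{E}(Q^p)\,\mathbb{E}\big(|\widehat X_n - \widehat Y_n|^p\big).$$

A straightforward recurrence leads to

$$\mathbb{E}\big(|\widehat X_n - \widehat Y_n|^p\big) \le \mathbb{E}(Q^p)^n\,|x - y|^p.$$

This gives the desired inequality when the initial laws are Dirac masses. The general case
follows by integrating this inequality with respect to couplings of the initial laws. □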
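The coupling used in this proof (the same $E_{n+1}$ and the same $Q_{n+1}$ for both chains) is easy to simulate. In the Python sketch below (function name ours, `sample_q` a hypothetical callable returning a draw from H), one can observe that for $H = \delta_\delta$ the contraction even holds pathwise: $|\widehat X_n - \widehat Y_n| \le \delta^n|x-y|$.

```python
import math
import random


def coupled_embedded_chains(x, y, n, sample_q, rng):
    """Coupling from the proof of Theorem 2.1: both chains follow (4) with the same
    E_{k+1} and the same Q_{k+1} at every step. Returns both trajectories."""
    xs, ys = [x], [y]
    for _ in range(n):
        e, q = rng.expovariate(1.0), sample_q(rng)
        xs.append(q * math.sqrt(xs[-1] ** 2 + 2.0 * e))
        ys.append(q * math.sqrt(ys[-1] ** 2 + 2.0 * e))
    return xs, ys
```

For a general H, the pathwise bound becomes $|\widehat X_n - \widehat Y_n| \le Q_1\cdots Q_n|x-y|$, whose p-th moment is $\mathbb{E}(Q^p)^n|x-y|^p$.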

Let us now investigate some properties of the kernel K defined by (5) that will be used
to provide concentration bounds for the ergodic theorem. The key point is that $K^n$ and
$\nu$ satisfy a Gross (or logarithmic Sobolev) inequality.

Definition 3.2 (Gross inequality). A law $\eta$ on $\mathbb{R}^d$ satisfies a Gross (or logarithmic Sobolev
[lUlSj]) inequality with constant $c > 0$ when for any smooth compactly supported $f : \mathbb{R}^d \to \mathbb{R}$,

$$\int f^2\log(f^2)\,d\eta - \int f^2\,d\eta\,\log\int f^2\,d\eta \le c\int|\nabla f|^2\,d\eta.$$

We denote by $\mathrm{Gross}(\eta) \in (0,\infty]$ the smallest constant for which this holds true.

If $F\cdot\eta$ is the image of $\eta$ by F then $\mathrm{Gross}(F\cdot\eta) \le \mathrm{Gross}(\eta)\|F\|_{\mathrm{Lip}}^2$. The Gross
inequality contains information on Gaussian concentration of measure: the function
$x \mapsto e^{\alpha x^2}$ is $\eta$-integrable as soon as $\alpha < 1/\mathrm{Gross}(\eta)$. Moreover, if $\eta$ has covariance $\Sigma$ with
spectral radius $\rho(\Sigma)$ then $2\rho(\Sigma) \le \mathrm{Gross}(\eta)$, and equality is achieved when $\eta$ is Gaussian.
Furthermore, for any $\alpha$-Lipschitz function $f : \mathbb{R}^d \to \mathbb{R}$ and any $\lambda > 0$,

$$\int e^{\lambda f}\,d\eta \le \exp\Big(\lambda\int f\,d\eta + \frac{C\alpha^2\lambda^2}{4}\Big) \qquad (11)$$

as soon as $C \ge \mathrm{Gross}(\eta)$. This means that $\eta$ satisfies a sub-Gaussian concentration of
measure for Lipschitz functions. For more details, see e.g. [20] and references therein.
Theorem 3.3 (Properties of the kernel of the embedded chain). Let $\widehat X$ be the embedded
chain associated to (1) with transition kernel (5). Assume that H is the Dirac mass at
point $\delta \in [0,1)$. If f is a 1-Lipschitz function from $[0,+\infty)$ to $\mathbb{R}$, then $x \mapsto K(f)(x)$ is a
$\delta$-Lipschitz function from $[0,+\infty)$ to $\mathbb{R}$. Moreover, for any $x \ge 0$, the law $K(x,\cdot)$ satisfies
a Gross inequality with constant $2\delta^2$.

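Proof. If $\delta = 0$, then $K(x,\cdot)$ is the Dirac mass at 0 and the result is trivial. For any smooth
function $f : [0,\infty) \to \mathbb{R}$, we have from (5) that

$$|(Kf)'(x)| = \Big|\,\delta\,\mathbb{E}\Big[f'\big(\delta\sqrt{x^2+2E}\big)\frac{x}{\sqrt{x^2+2E}}\Big]\Big| \le \delta K(|f'|)(x). \qquad (12)$$

In particular, if f is 1-Lipschitz then $Kf$ is $\delta$-Lipschitz. Let us show now that for every $x \ge 0$ the law $K(x,\cdot) = \mathcal{L}(\widehat X_1 \mid \widehat X_0 = x)$ satisfies a Gross
inequality with constant $2\delta^2$. Since E is exponential of mean 1, the law $\eta$ of $\sqrt{E/2}$ is a
$\chi$-distribution with probability density and cumulative distribution functions given by

$$g : v \mapsto 4v\,e^{-2v^2}\,\mathbf{1}_{\{v>0\}} \quad\text{and}\quad G : v \mapsto \big(1 - e^{-2v^2}\big)\mathbf{1}_{\{v>0\}}.$$

On the other hand, $2E = U_1^2 + U_2^2$ in law, where $U_1, U_2$ are i.i.d. standard Gaussians, and thus

$$\sqrt{\frac{E}{2}} = \frac{\sqrt{U_1^2 + U_2^2}}{2} \quad\text{in law}.$$

Also, $\eta$ is the image of the Gaussian law $\mathcal{N}(0, I_2)$ on $\mathbb{R}^2$ by a $(1/2)$-Lipschitz function, and
this implies that $\eta$ satisfies a Gross inequality with constant $2 \times (1/2)^2 = 1/2$. Moreover, the change of variable $\sqrt{x^2 + 2u} = 2v$ gives

$$K(f)(x) = \int_0^\infty f\big(\delta\sqrt{x^2+2u}\big)e^{-u}\,du = \int_{x/2}^\infty f(2\delta v)\,4v\,e^{-2v^2 + x^2/2}\,dv = \int_0^\infty f(2\delta v)\,\frac{g(v)\,\mathbf{1}_{\{v > x/2\}}}{1 - G(x/2)}\,dv.$$

Thus, $K(x,\cdot)$ is just the image law by the Lipschitz map $v \mapsto 2\delta v$ of the law $\eta$ conditioned
on $(x/2, +\infty)$. This conditional law is in turn the image of $\eta$ by the function

$$t \mapsto G^{-1}\big(G(x/2) + (1 - G(x/2))G(t)\big) = G^{-1}\big(1 - \exp(-2t^2 - x^2/2)\big) = \sqrt{t^2 + x^2/4}.$$

This function is 1-Lipschitz for any $x \ge 0$. Consequently, by using twice the stability
of Gross inequalities by Lipschitz maps, we obtain that for every $x \ge 0$, the law $K(x,\cdot)$
satisfies a Gross inequality with constant $(2\delta)^2/2 = 2\delta^2$. □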
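The $\delta$-Lipschitz property of $K(f)$ can be checked numerically. The Python sketch below is an illustration only (function names and quadrature parameters are our arbitrary choices): it evaluates $K(f)(x) = \int_0^\infty f(\delta\sqrt{x^2+2u})e^{-u}\,du$ by the trapezoidal rule on a truncated interval and reports the largest difference quotient of $x \mapsto K(f)(x)$ over a grid.

```python
import math


def kernel_expectation(f, x, delta, upper=40.0, steps=8000):
    """K(f)(x) = E f(delta * sqrt(x^2 + 2E)) from (5) with H = Dirac(delta), computed by
    the trapezoidal rule on int_0^upper f(delta * sqrt(x^2 + 2u)) e^{-u} du."""
    h = upper / steps

    def integrand(u):
        return f(delta * math.sqrt(x * x + 2.0 * u)) * math.exp(-u)

    s = 0.5 * (integrand(0.0) + integrand(upper))
    s += sum(integrand(i * h) for i in range(1, steps))
    return s * h


def max_slope(f, delta, xs):
    """Largest difference quotient of x -> K(f)(x) over consecutive grid points."""
    values = [kernel_expectation(f, x, delta) for x in xs]
    return max(abs(values[i + 1] - values[i]) / (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))
```

For $f(y) = y$ the slope approaches $\delta$ for large x, since $\mathbb{E}[x/\sqrt{x^2+2E}] \to 1$, which shows that the constant $\delta$ cannot be improved in general.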

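Remark 3.4. When $\delta = 0$, the embedded chain is the constant Markov chain equal to
0. Moreover, the chain $(Z_n)_{n\ge1}$ defined by $Z_n = X_{T_n^-}$ is also quite simple to study.
Indeed, since $Z_n = \sqrt{2E_n}$, the random variables $(Z_n)_{n\ge1}$ are i.i.d. and have the law $\nu_0$ of $\sqrt{2E}$. The previous
proof ensures that $\nu_0$ satisfies a Gross inequality with constant 2. One of the most useful
properties of the Gross inequality is the tensorization property: $\mathrm{Gross}(\eta^{\otimes n}) \le \mathrm{Gross}(\eta)$ for
every $n \ge 1$, see e.g. [JH, Chapter 1]. Since $(z_1,\ldots,z_n) \mapsto \frac{1}{n}\sum_{k=1}^{n} f(z_k)$ is $n^{-1/2}$-Lipschitz when f is 1-Lipschitz, the concentration property (11) gives, for
any 1-Lipschitz function f and any $u \ge 0$,

$$\mathbb{P}\Big(\Big|\frac{1}{n}\sum_{k=1}^{n} f(Z_k) - \int f\,d\nu_0\Big| \ge u\Big) \le 2\exp\Big(-\frac{nu^2}{2}\Big).$$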
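The bound of Remark 3.4 can be compared with simulation. The Python sketch below uses illustrative names of our own and the test function $f(z) = z$, for which the mean of the law of $\sqrt{2E}$ is $\mathbb{E}\sqrt{2E} = \sqrt{\pi/2}$; it estimates the deviation probability by Monte Carlo, and the empirical frequency should stay below $2\exp(-nu^2/2)$.

```python
import math
import random


def empirical_deviation(n, u, trials, rng):
    """Monte Carlo estimate of P(|(1/n) sum_k Z_k - E Z| >= u) for Z_k i.i.d. sqrt(2E),
    i.e. Remark 3.4 with the 1-Lipschitz test function f(z) = z."""
    target = math.sqrt(math.pi / 2.0)      # E sqrt(2E) for E ~ Exp(1)
    hits = 0
    for _ in range(trials):
        mean = sum(math.sqrt(2.0 * rng.expovariate(1.0)) for _ in range(n)) / n
        hits += abs(mean - target) >= u
    return hits / trials


def gross_bound(n, u):
    """Right-hand side 2 exp(-n u^2 / 2) of Remark 3.4."""
    return 2.0 * math.exp(-n * u * u / 2.0)
```

The bound is typically far from tight for a fixed 1-Lipschitz f, since it holds uniformly over all such functions.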
In the more general case where $\delta$ is positive, $(\widehat X_n)_{n\ge1}$ is no longer i.i.d. Nevertheless,
the Gross inequality holds true for the iterated kernels and for the invariant law $\nu$:

Corollary 3.5 (Gross inequality for the embedded chain and its invariant law $\nu$). Let $\widehat X$ be
the embedded chain associated to (1). Assume that H is the Dirac mass at point $\delta \in (0,1)$.
For every $n \ge 0$, let $K^n$ be the iterated transition kernel of $\widehat X$, defined recursively for every
bounded measurable function $f : [0,\infty) \to \mathbb{R}$ by

$$K^0(f) = f \quad\text{and}\quad K^{n+1}(f) = K(K^n(f))$$

where K is the kernel of $\widehat X$ as in (5). Then for every integer $n \ge 1$ and every real $x \ge 0$,
the iterated kernel $K^n(x,\cdot)$ of $\widehat X$ satisfies a Gross inequality and

$$\mathrm{Gross}(K^n(x,\cdot)) \le 2\delta^2\,\frac{1 - \delta^{2n}}{1 - \delta^2}.$$

Also, the invariant law $\nu$ of $\widehat X$ (see Theorem 3.1) satisfies a Gross inequality and

$$\mathrm{Gross}(\nu) \le 2\delta^2(1 - \delta^2)^{-1}.$$
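Proof. Recall that for every $n \ge 0$, $x \ge 0$, and bounded measurable $f : [0,\infty) \to \mathbb{R}$,

$$\mathbb{E}\big(f(\widehat X_n) \mid \widehat X_0 = x\big) = (K^n f)(x) = \int_0^\infty f(y)\,K^n(x,dy).$$

To show that $K^n$ satisfies a Gross inequality, we use a semi-group decomposition technique
borrowed from [13]. For any $n \ge 1$ and any smooth function $f : [0,\infty) \to \mathbb{R}$, the quantity

$$\mathcal{E}_n(f) := K^n(f^2\log f^2) - K^n(f^2)\log K^n(f^2)$$

is equal to the telescopic sum

$$\sum_{i=1}^{n}\Big\{K^i\big[K^{n-i}(f^2)\log K^{n-i}(f^2)\big] - K^{i-1}\big[K^{n-i+1}(f^2)\log K^{n-i+1}(f^2)\big]\Big\}.$$

Since the measure $K(x,\cdot)$ satisfies a Gross inequality of constant $2\delta^2$ for every $x \ge 0$, we get, with $g_m = \sqrt{K^m(f^2)}$,

$$\mathcal{E}_n(f) = \sum_{i=1}^{n} K^{i-1}\big[\mathcal{E}_1(g_{n-i})\big] \le 2\delta^2\sum_{i=1}^{n} K^{i}\big(|\nabla g_{n-i}|^2\big).$$

Now, since $g_m^2 = K(g_{m-1}^2)$ and $|\nabla g_{m-1}^2| = 2g_{m-1}|\nabla g_{m-1}|$, the commutation (12) gives, for all $m \ge 1$,

$$|\nabla g_m|^2 = \frac{|\nabla K(g_{m-1}^2)|^2}{4K(g_{m-1}^2)} \le \delta^2\,\frac{\big(K(g_{m-1}|\nabla g_{m-1}|)\big)^2}{K(g_{m-1}^2)}.$$

Next, the Cauchy-Schwarz inequality

$$\big(K(g_{m-1}|\nabla g_{m-1}|)\big)^2 \le K(g_{m-1}^2)\,K(|\nabla g_{m-1}|^2)$$

gives $|\nabla g_m|^2 \le \delta^2 K(|\nabla g_{m-1}|^2)$. From these bounds, a straightforward induction gives, since $g_0 = |f|$,

$$|\nabla g_m|^2 \le \delta^{2m}K^m(|\nabla f|^2).$$

Consequently, by putting all together, we have

$$\mathcal{E}_n(f) \le 2\delta^2\sum_{i=0}^{n-1}\delta^{2i}K^n(|\nabla f|^2) = 2\delta^2\,\frac{1-\delta^{2n}}{1-\delta^2}\,K^n(|\nabla f|^2).$$

This gives

$$\mathrm{Gross}(K^n(x,\cdot)) \le 2\delta^2(1 - \delta^{2n})(1 - \delta^2)^{-1}.$$

Finally, from Theorem 3.1, $K^n(x,\cdot)$ tends weakly to $\nu$ as n tends to infinity and thus

$$\mathrm{Gross}(\nu) \le \limsup_{n\to\infty}\mathrm{Gross}(K^n(x,\cdot)) \le 2\delta^2(1 - \delta^2)^{-1}.$$

□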
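Since $2\rho(\Sigma) \le \mathrm{Gross}(\eta)$ for any law $\eta$, Corollary 3.5 implies in particular that the variance of $\nu$ is at most $\delta^2/(1-\delta^2)$. The Python sketch below checks this consequence by simulation, drawing from $\nu$ through the series $(2\sum_{k\ge1}\delta^{2k}E_k)^{1/2}$ truncated after a fixed number of terms (an arbitrary choice). It is a sanity check under these assumptions, not a proof, and the function names are ours.

```python
import math
import random


def sample_nu(delta, rng, terms=40):
    """Approximate draw from nu (H = Dirac(delta)) using the series of Theorem 3.1,
    truncated after `terms` summands."""
    return math.sqrt(2.0 * sum(delta ** (2 * k) * rng.expovariate(1.0)
                               for k in range(1, terms + 1)))


def sample_variance(values):
    """Unbiased empirical variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def gross_bound_nu(delta):
    """Upper bound 2 delta^2 / (1 - delta^2) on Gross(nu) from Corollary 3.5."""
    return 2.0 * delta ** 2 / (1.0 - delta ** 2)
```

For $\delta = 1/2$ the bound on the variance is $1/3$, while the simulated variance is much smaller, which is consistent with the fact that Gross constants also control Gaussian tails and not only the variance.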



The Gross inequality for K can also be used to derive Theorem 2.2.



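Proof of Theorem 2.2. We shall establish that for any $u \ge 0$ and any 1-Lipschitz function
$f : [0,\infty) \to \mathbb{R}$,

$$\mathbb{P}\Big(\frac{1}{n}\sum_{k=1}^{n} f(\widehat X_k) - \int f\,d\nu \ge u + \frac{W_1(\delta_x,\nu)}{n(1-\delta)}\Big) \le \exp\Big(-\frac{n(1-\delta)^2u^2}{2\delta^2}\Big),$$

and the desired result follows immediately from this bound used for f and $-f$. For any
1-Lipschitz function f, any $r \ge 0$ and any $\lambda > 0$, Markov's inequality gives

$$\mathbb{P}\Big(\frac{1}{n}\sum_{k=1}^{n} f(\widehat X_k) \ge r\Big) \le e^{-\lambda nr}\,\mathbb{E}\Big(e^{\lambda\sum_{k=1}^{n} f(\widehat X_k)}\Big).$$

Now the Markov property ensures that

$$\mathbb{E}\Big(e^{\lambda\sum_{k=1}^{n} f(\widehat X_k)}\Big) = \mathbb{E}\Big(e^{\lambda\sum_{k=1}^{n-1} f(\widehat X_k)}\,\mathbb{E}\big(e^{\lambda f(\widehat X_n)} \mid \widehat X_{n-1}\big)\Big) = \mathbb{E}\Big(e^{\lambda\sum_{k=1}^{n-1} f(\widehat X_k)}\,K\big(e^{\lambda f}\big)(\widehat X_{n-1})\Big).$$

From Theorem 3.3, the kernel $K(x,\cdot)$ of $\widehat X$ satisfies a Gross inequality with constant $2\delta^2$
for every $x \ge 0$. This inequality implies by (11) that for any c-Lipschitz function g,

$$K\big(e^{\lambda g}\big)(x) \le \exp\Big(\lambda K(g)(x) + \frac{c^2\delta^2\lambda^2}{2}\Big).$$

Consequently, the Laplace transform of the ergodic mean can be bounded as follows:

$$\mathbb{E}\Big(e^{\lambda\sum_{k=1}^{n} f(\widehat X_k)}\Big) \le e^{\delta^2\lambda^2/2}\,\mathbb{E}\Big(e^{\lambda\sum_{k=1}^{n-2} f(\widehat X_k)}\,e^{\lambda(f + Kf)(\widehat X_{n-1})}\Big).$$

The commutation relation (12) ensures that $f + Kf$ is $(1+\delta)$-Lipschitz and then

$$\mathbb{E}\big(e^{\lambda(f+Kf)(\widehat X_{n-1})} \mid \widehat X_{n-2}\big) \le \exp\Big(\lambda K(f + Kf)(\widehat X_{n-2}) + \frac{(1+\delta)^2\delta^2\lambda^2}{2}\Big).$$

A straightforward recurrence, using that $f + Kf + \cdots + K^jf$ is $(1 + \delta + \cdots + \delta^j)$-Lipschitz, hence $(1-\delta)^{-1}$-Lipschitz, ensures that

$$\mathbb{E}\Big(e^{\lambda\sum_{k=1}^{n} f(\widehat X_k)}\Big) \le \exp\Big(\frac{n\delta^2\lambda^2}{2(1-\delta)^2}\Big)\,e^{\lambda\sum_{k=1}^{n} K^kf(x)}.$$

Choosing $r = (1/n)\sum_{k=1}^{n} K^kf(x) + u$ leads to

$$\mathbb{P}\Big(\frac{1}{n}\sum_{k=1}^{n} f(\widehat X_k) \ge \frac{1}{n}\sum_{k=1}^{n} K^kf(x) + u\Big) \le \exp\Big(-\lambda nu + \frac{n\delta^2\lambda^2}{2(1-\delta)^2}\Big).$$

The right hand side is minimal for $\lambda = u(1-\delta)^2\delta^{-2}$, where it equals $\exp(-n(1-\delta)^2u^2/(2\delta^2))$. At this point, we recall the dual
formulation of $W_1(\alpha,\beta)$ for all probability laws $\alpha$ and $\beta$:

$$W_1(\alpha,\beta) = \sup_{\|f\|_{\mathrm{Lip}} \le 1}\Big(\int f\,d\alpha - \int f\,d\beta\Big) \quad\text{where}\quad \|f\|_{\mathrm{Lip}} = \sup_{x \ne y}\frac{|f(x) - f(y)|}{|x - y|}.$$

Therefore, by using Theorem 2.1 (with $\mathbb{E}(Q) = \delta$), one gets

$$\Big|\frac{1}{n}\sum_{k=1}^{n} K^kf(x) - \int f\,d\nu\Big| \le \frac{1}{n}\sum_{k=1}^{n} W_1\big(\mathcal{L}(\widehat X_k), \nu\big) \le \frac{1}{n}\sum_{k=1}^{n}\delta^k\,W_1(\delta_x,\nu) \le \frac{W_1(\delta_x,\nu)}{n(1-\delta)},$$

which gives the claim.

□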
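The final optimization over $\lambda$ is elementary. Assuming, as the recurrence on Lipschitz constants $1 + \delta + \cdots + \delta^j \le (1-\delta)^{-1}$ suggests, that the Laplace bound has exponent $-\lambda nu + n\delta^2\lambda^2/(2(1-\delta)^2)$, the short Python check below (names ours) confirms that this quadratic is minimal at $\lambda = u(1-\delta)^2/\delta^2$ with minimum $-n(1-\delta)^2u^2/(2\delta^2)$.

```python
def chernoff_exponent(lam, n, u, delta):
    """Exponent -lam*n*u + n*delta^2*lam^2 / (2*(1-delta)^2) from the Laplace bound."""
    return -lam * n * u + n * delta ** 2 * lam ** 2 / (2.0 * (1.0 - delta) ** 2)


def optimal_lambda(u, delta):
    """Minimizer of the quadratic exponent above."""
    return u * (1.0 - delta) ** 2 / delta ** 2


def optimal_exponent(n, u, delta):
    """Value at the minimizer: -n (1-delta)^2 u^2 / (2 delta^2)."""
    return -n * (1.0 - delta) ** 2 * u ** 2 / (2.0 * delta ** 2)
```

For instance, with $n = 10$, $u = 0.3$ and $\delta = 1/2$, the optimal exponent is $-0.45$.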



Remark 3.6. A careful reading of the proof of Theorem 2.2 suggests that one may replace
the initial law $\delta_x$ by a more general initial law, provided that it satisfies a sub-Gaussian
concentration for Lipschitz functions such as (11).



4 Continuous time process 

As an introduction to our coupling method to prove Theorem 2.3, let us consider the
following simpler dynamics, studied recently in [12, 17]. The window size is modeled by
a Markov process $X = (X_t)_{t\ge0}$ that increases linearly with rate one. Congestion signals
arrive according to a Poisson process with constant rate $\lambda > 0$, and upon receipt of the
k-th signal, the window size is reduced by multiplication with a random variable $Q_k$. We
assume that $(Q_k)_{k\ge1}$ is a sequence of i.i.d. random variables of law H with support in
$[0,1)$. In other words, the process X is generated by

$$L(f)(x) = f'(x) + \lambda\int_0^1 \big(f(hx) - f(x)\big)\,H(dh) \qquad (13)$$

where $\lambda$ is this time a positive real number. In this framework, one can compute explicitly
the transient moments of $X_t$ (see [12, 17]): for every $n \ge 0$, every $x \ge 0$, and every $t \ge 0$,

where for every real or integer $p \ge 1$ the quantity $\theta_p$ is as in our Theorem 4.1. In contrast
with the original dynamics (1), the jump rate is constant and thus the jumps occur at
Poissonian times. In this framework, we easily derive the following theorem, which states
exponential ergodicity in all Wasserstein distances.

Theorem 4.1 (Wasserstein Exponential Ergodicity for constant jump rate). Let
$X = (X_t)_{t\ge0}$ and $Y = (Y_t)_{t\ge0}$ be two processes generated by (13). Assume that $\mathcal{L}(X_0)$ and
$\mathcal{L}(Y_0)$ have finite p-th moment for some real $p \ge 1$. If one defines $\theta_p = \lambda(1 - \mathbb{E}(Q^p))$ with
$Q \sim H$, then for every $t \ge 0$,

$$W_p\big(\mathcal{L}(X_t), \mathcal{L}(Y_t)\big) \le W_p\big(\mathcal{L}(X_0), \mathcal{L}(Y_0)\big)\,e^{-\theta_p t/p}.$$

We do not know whether the exponential rate of convergence in Theorem 4.1 is optimal. One may
try to get an upper bound from the moments formula (14).

Proof of Theorem 4.1. Let $N = (N_t)_{t\ge0}$ be a Poisson process with constant intensity $\lambda$
and $Q = (Q_k)_{k\ge1}$ be i.i.d. random variables with law H, independent of N. For any
$x, y \ge 0$, let us consider the processes $X = (X_t)_{t\ge0}$ and $Y = (Y_t)_{t\ge0}$ starting respectively
at x and y at time 0, that jump when N does, with a multiplicative factor $Q_k$ for the
k-th jump, and increase linearly with slope one between these jumps. It is quite clear
that these processes are generated by (13). Moreover, between jumps, $|X_t - Y_t|$ remains
constant, and at the k-th jump this quantity is multiplied by $Q_k$. Thus for every $t \ge 0$ and
every $p \ge 1$,

$$\mathbb{E}\big(|X_t - Y_t|^p\big) = \sum_{k=0}^{\infty}\mathbb{E}\big(|X_t - Y_t|^p\mathbf{1}_{\{N_t = k\}}\big) = |x - y|^p\sum_{k=0}^{\infty}\mathbb{E}(Q^p)^k\,\mathbb{P}(N_t = k) = |x - y|^p\,e^{-\lambda t(1 - \mathbb{E}(Q^p))} = |x - y|^p\,e^{-\theta_p t}.$$

As a consequence, if X and Y are now two processes generated by
(13) with a constant jump intensity $\lambda$ and arbitrary initial laws, we obtain that, for any
coupling $\Pi$ of their initial laws $\mathcal{L}(X_0)$ and $\mathcal{L}(Y_0)$, any $t \ge 0$, and any $p \ge 1$,

$$W_p\big(\mathcal{L}(X_t), \mathcal{L}(Y_t)\big)^p \le e^{-\theta_p t}\int_{[0,\infty)^2}|x - y|^p\,\Pi(dx,dy).$$

Taking the infimum over $\Pi$ concludes the proof. □
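The synchronous coupling used in this proof (common Poisson clock, common factors $Q_k$) is straightforward to simulate. The Python sketch below (function names ours, `sample_q` a hypothetical callable returning a draw from H) estimates $\mathbb{E}|X_t - Y_t|^p$, which the proof shows to be equal to $|x-y|^pe^{-\theta_pt}$; with H uniform on $[0,1)$ one has $\mathbb{E}(Q^p) = 1/(p+1)$.

```python
import math
import random


def coupled_constant_rate(x, y, t, lam, sample_q, rng):
    """Synchronous coupling of two copies of (13): same Poisson clock of rate lam
    and same factors Q_k. Returns (X_t, Y_t)."""
    s = 0.0
    while True:
        w = rng.expovariate(lam)                   # waiting time to the next signal
        if s + w > t:
            return x + (t - s), y + (t - s)        # no more jumps before time t
        s += w
        q = sample_q(rng)
        x, y = q * (x + w), q * (y + w)            # both jump with the same factor


def mean_abs_diff_power(x, y, t, lam, p, sample_q, rng, trials):
    """Monte Carlo estimate of E|X_t - Y_t|^p under the coupling above."""
    total = 0.0
    for _ in range(trials):
        xt, yt = coupled_constant_rate(x, y, t, lam, sample_q, rng)
        total += abs(xt - yt) ** p
    return total / trials
```

Here $|X_t - Y_t| = |x - y|\,Q_1\cdots Q_{N_t}$ exactly, so the only source of error is the Monte Carlo average.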

Let us now turn to the generalized TCP window size process generated by the infinitesimal
generator (2). Consider a two dimensional process where both components
are generated by (2). Since the sample paths of both components have slope 1 between
jumps, the distance between them remains constant except at jump times. If the components
jump together with the same factor Q, then this distance is also multiplied by
Q. Thus, our strategy is to encourage simultaneous jumps: let us introduce the Markov
process $((X_t, Y_t))_{t\ge0}$ on $[0,\infty)^2$ defined by its infinitesimal generator

$$\begin{aligned}\widetilde L f(x,y) = {}& \partial_1 f(x,y) + \partial_2 f(x,y) + (x - y)\int_0^1 \big(f(hx,y) - f(x,y)\big)\,H(dh)\\ & + (y + a)\int_0^1 \big(f(hx,hy) - f(x,y)\big)\,H(dh)\end{aligned}$$

if $x \ge y$ (if $y > x$ one has to exchange the roles of x and y). This is exactly the generator (7).

Choosing a test function f of the form $f(x,y) = g(x)$ or $f(x,y) = g(y)$ shows that X
and Y are both Markov processes with infinitesimal generator $L_a$ given by (2).

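The dynamics of $(X, Y)$ is as follows: if $(X_0, Y_0) = (x, y)$ with for example $x \ge y$, then

• the first jump time T has density $t \mapsto (x + a + t)\,e^{-t^2/2 - (x+a)t}\,\mathbf{1}_{(0,+\infty)}(t)$,

• on the event $\{T = t\}$ we have $(X_s, Y_s) = (x + s, y + s)$ for $s < t$ and

$$(X_t, Y_t) = \begin{cases}\big(Q(x+t),\, Q(y+t)\big) & \text{with probability } \dfrac{y+t+a}{x+t+a},\\[2mm] \big(Q(x+t),\, y+t\big) & \text{with probability } \dfrac{x-y}{x+t+a},\end{cases}$$

where Q is a random variable of law H, independent of T.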
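The dynamics just described can be simulated exactly. The Python sketch below restricts to $H = \delta_h$, an assumption made only for this illustration, and the function names are ours. The next jump time of the upper component is the root of $(\max(x,y)+a)s + s^2/2 = E$, and the coin is tossed with the post-drift positions, as in the description above. It can be used to compare $\mathbb{E}_{(x,y)}|X_t - Y_t|$ with the bound of Theorem 2.3 when $a > 0$, or with (9) when $a = 0$.

```python
import math
import random


def coupled_tcp(x, y, t_end, a, h, rng):
    """Simulate the coupling (7) of two copies of (2) with H = Dirac(h) up to time t_end.
    The upper component jumps at rate (upper + a); the lower one jumps with it with
    probability (lower + a) / (upper + a), using the same factor h."""
    t = 0.0
    while True:
        c = max(x, y) + a
        e = rng.expovariate(1.0)
        # Next jump of the upper component: root of c*s + s^2/2 = E, written stably.
        s = 2.0 * e / (math.sqrt(c * c + 2.0 * e) + c)
        if t + s >= t_end:
            return x + (t_end - t), y + (t_end - t)
        t += s
        x, y = x + s, y + s
        hi, lo = max(x, y), min(x, y)
        if rng.random() < (lo + a) / (hi + a):   # simultaneous jump
            x, y = h * x, h * y
        elif x >= y:                             # upper component jumps alone
            x = h * x
        else:
            y = h * y


def mean_distance(x, y, t_end, a, h, trials, seed=0):
    """Monte Carlo estimate of E_(x,y)|X_t - Y_t| at t = t_end under the coupling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xt, yt = coupled_tcp(x, y, t_end, a, h, rng)
        total += abs(xt - yt)
    return total / trials
```

With $(x,y) = (2,1)$, $h = 1/2$ and $a = 0$, this produces the kind of estimate shown as the red curve of Figure 1.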

4.1 The modified TCP process 

The first part of this section is dedicated to the proof of Theorem 2.3.



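Proof of Theorem 2.3. We have to study the function $\alpha_{(x,y)} : t \mapsto \mathbb{E}_{(x,y)}(|X_t - Y_t|)$, where
$(X, Y)$ evolves according to the generator $\widetilde L$. Assume that $x > y$ (the case $x < y$ is symmetric and the case $x = y$ is trivial); then, since the drift does not modify $|x - y|$,

$$\begin{aligned}\alpha_{(x,y)}'(0) &= (x - y)\int_0^1 \big(|hx - y| - |x - y|\big)\,H(dh) + (y + a)(x - y)\int_0^1 (h - 1)\,H(dh)\\ &= -(x - y)\int_0^1 \mathbf{1}_{\{hx > y\}}(1 - h)(x + y + a)\,H(dh) - (x - y)\int_0^1 \mathbf{1}_{\{hx \le y\}}\big((1 + h)(x - y) + (1 - h)a\big)\,H(dh)\\ &\le -a(x - y)\int_0^1 (1 - h)\,H(dh).\end{aligned}$$

The Markov property ensures that

$$\alpha_{(x,y)}'(t) \le -a\kappa_1\,\alpha_{(x,y)}(t),$$

where $\kappa_1 = 1 - \int_0^1 h\,H(dh)$. This obviously implies that

$$\mathbb{E}_{(x,y)}\big(|X_t - Y_t|\big) \le |x - y|\,e^{-a\kappa_1 t}.$$

The end of the proof is straightforward. If $X = (X_t)_{t\ge0}$ and $Y = (Y_t)_{t\ge0}$ are two processes
generated by (2) and if $\Pi$ is a coupling of $\mathcal{L}(X_0)$ and $\mathcal{L}(Y_0)$, we have, for every $t \ge 0$,

$$W_1\big(\mathcal{L}(X_t), \mathcal{L}(Y_t)\big) \le \int_{[0,\infty)^2}\mathbb{E}\big(|X_t - Y_t| \mid X_0 = x, Y_0 = y\big)\,\Pi(dx,dy) \le e^{-a\kappa_1 t}\int_{[0,\infty)^2}|x - y|\,\Pi(dx,dy).$$

Taking the infimum over $\Pi$ provides the result. □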
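The case split in the computation of $\alpha'_{(x,y)}(0)$ can be double-checked numerically. In the Python sketch below (names ours), for a fixed factor h the integrand $(x-y)(|hx-y| - |x-y|) + (y+a)(x-y)(h-1)$ is compared with the two closed forms $-(x-y)(1-h)(x+y+a)$ (when $hx > y$) and $-(x-y)((1+h)(x-y) + (1-h)a)$ (when $hx \le y$), and with the final bound $-a(x-y)(1-h)$, on a grid of points with $x > y \ge 0$.

```python
def alpha_prime_direct(x, y, a, h):
    """Integrand of the first line: lone jump of the upper component (rate x - y)
    plus joint jump (rate y + a); the drift does not change |x - y|."""
    return (x - y) * (abs(h * x - y) - abs(x - y)) + (y + a) * (x - y) * (h - 1.0)


def alpha_prime_cases(x, y, a, h):
    """Integrand of the second line, split according to hx > y or hx <= y."""
    if h * x > y:
        return -(x - y) * (1.0 - h) * (x + y + a)
    return -(x - y) * ((1.0 + h) * (x - y) + (1.0 - h) * a)


def check_grid(a, steps=20):
    """True if both forms agree and satisfy the bound on a grid with x > y >= 0."""
    for i in range(1, steps + 1):
        for j in range(i):
            x, y = 0.5 * i, 0.5 * j
            for k in range(steps):
                h = k / steps
                lhs = alpha_prime_direct(x, y, a, h)
                if abs(lhs - alpha_prime_cases(x, y, a, h)) > 1e-9:
                    return False
                if lhs > -a * (x - y) * (1.0 - h) + 1e-9:
                    return False
    return True
```

With $a = 0$ the same check covers the two cases used in the proof of Theorem 2.5.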

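Let us now turn to the proof of Theorem 2.4.

Proof of Theorem 2.4. The function f defined by $f(x) = x$ for every x satisfies

$$L_af(x) = 1 - \kappa_1 x(x + a) \le 1 - \kappa_1 x^2,$$

where $\kappa_1 = 1 - \int_0^1 h\,H(dh) \in (0,1]$. Now, for every $x \ge 0$ and $t \ge 0$,

$$\alpha_x(t) := \mathbb{E}(X_t \mid X_0 = x) = \alpha_x(0) + \int_0^t \alpha_x'(s)\,ds = x + \int_0^t \mathbb{E}\big((L_af)(X_s) \mid X_0 = x\big)\,ds \le x + \int_0^t \big(1 - \kappa_1\mathbb{E}(X_s^2 \mid X_0 = x)\big)\,ds.$$

Also, since $-\kappa_1$ is negative, we obtain, by using Jensen's inequality,

$$\alpha_x'(t) \le 1 - \kappa_1\mathbb{E}(X_t^2 \mid X_0 = x) \le 1 - \kappa_1\alpha_x(t)^2.$$

As a consequence, $\alpha_x \le \beta_x$, where $\beta_x$ is the solution of the Riccati differential equation

$$\beta_x(0) = x, \qquad \beta_x'(t) = 1 - \kappa_1\beta_x(t)^2 \quad\text{for } t > 0.$$

Denoting $d = \sqrt{\kappa_1}$, one gets, for $x \ge 1/d$,

$$\beta_x(t) = \frac{1}{d} + \frac{2(x - 1/d)e^{-2dt}}{(dx + 1) - (dx - 1)e^{-2dt}} = \frac{1}{d}\,\frac{dx\cosh(dt) + \sinh(dt)}{dx\sinh(dt) + \cosh(dt)} \le \frac{1}{d\tanh(dt)},$$

and therefore

$$\sup_{x \ge 1/d}\alpha_x(t) \le \frac{1}{d\tanh(dt)}.$$

On the other hand, since $1/d$ is a stationary solution of the Riccati equation, we also have $\sup_{x \le 1/d}\alpha_x(t) \le 1/d$, and thus for every $t > 0$,

$$\sup_{x \ge 0}\alpha_x(t) \le \frac{1}{d\tanh(dt)}.$$

Consider now two processes $(X_t)_{t\ge0}$ and $(Y_t)_{t\ge0}$ generated by (2) with arbitrary initial
laws. For any $s > 0$, $\mathbb{E}(|X_s - Y_s|) \le 2\sup_x\alpha_x(s)$, and therefore the upper bound above
gives

$$W_1\big(\mathcal{L}(X_s), \mathcal{L}(Y_s)\big) \le \frac{2}{\sqrt{\kappa_1}\tanh(\sqrt{\kappa_1}\,s)}.$$

Together with Theorem 2.3, this gives the following uniform estimate, for every $t \ge s > 0$,

$$W_1\big(\mathcal{L}(X_t), \mathcal{L}(Y_t)\big) \le W_1\big(\mathcal{L}(X_s), \mathcal{L}(Y_s)\big)e^{-a\kappa_1(t-s)} \le \frac{2e^{a\kappa_1 s}}{\sqrt{\kappa_1}\tanh(\sqrt{\kappa_1}\,s)}\,e^{-a\kappa_1 t}.$$

□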
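The closed form of the solution of the Riccati equation $\beta' = 1 - \kappa_1\beta^2$, $\beta(0) = x$, and the uniform bound $\beta_x(t) \le 1/(d\tanh(dt))$ with $d = \sqrt{\kappa_1}$, can be cross-checked numerically. The Python sketch below (names and step count are our choices) integrates the equation with the classical fourth-order Runge-Kutta scheme and compares the result with the formula $\beta_x(t) = \frac{1}{d}\frac{dx\cosh(dt) + \sinh(dt)}{dx\sinh(dt) + \cosh(dt)}$.

```python
import math


def beta_closed_form(x, t, kappa1):
    """Solution of beta' = 1 - kappa1 * beta^2, beta(0) = x, written with d = sqrt(kappa1)."""
    d = math.sqrt(kappa1)
    ch, sh = math.cosh(d * t), math.sinh(d * t)
    return (d * x * ch + sh) / (d * (d * x * sh + ch))


def beta_rk4(x, t, kappa1, steps=20000):
    """Same ODE integrated with the classical fourth-order Runge-Kutta scheme."""
    h = t / steps
    b = x

    def rhs(v):
        return 1.0 - kappa1 * v * v

    for _ in range(steps):
        k1 = rhs(b)
        k2 = rhs(b + 0.5 * h * k1)
        k3 = rhs(b + 0.5 * h * k2)
        k4 = rhs(b + h * k3)
        b += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return b
```

Since the closed form is increasing in x and tends to $\coth(dt)/d$ as $x \to \infty$, the bound $1/(d\tanh(dt))$ is uniform in the starting point, which is the key to Theorem 2.4.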

4.2 The real TCP process 

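We end by giving the proof of Theorem 2.5 and making some comments on it.
Proof of Theorem 2.5. We start the proof as in Theorem 2.3: when $a = 0$ and $H = \delta_h$, for $x > y$,

$$\alpha_{(x,y)}'(0) = \begin{cases} -(1 - h)(x + y)(x - y) & \text{if } hx > y,\\ -(1 + h)(x - y)^2 & \text{if } hx \le y.\end{cases}$$

The first bound is better. Nevertheless, if $D_h$ is the set $\{(x,y) : hy \le x \le y/h\}$, one has
to notice that the process $(X, Y)$ cannot exit $D_h$, on which only the second case occurs. Then, thanks to the Markov property, we
get the following bound:

$$\frac{d}{dt}\mathbb{E}\big(|X_t - Y_t|\big) \le -(1 + h)\,\mathbb{E}\big(|X_t - Y_t|^2\big).$$

Jensen's inequality ensures that

$$\frac{d}{dt}\mathbb{E}\big(|X_t - Y_t|\big) \le -(1 + h)\big(\mathbb{E}(|X_t - Y_t|)\big)^2,$$

and thus, for any $t \ge 0$,

$$\mathbb{E}\big(|X_t - Y_t|\big) \le \frac{\mathbb{E}(|X_0 - Y_0|)}{1 + (1 + h)\,\mathbb{E}(|X_0 - Y_0|)\,t}.$$

□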



Figure 1 suggests that the convergence rate given by Theorem 2.5 is far from being
satisfactory. The non-optimality of the coupling is clear. However, even with such a
coupling, one could expect an explicit exponential upper bound. Let us denote $D_t =
|X_t - Y_t|$ where $(X_t, Y_t)$ is defined in Theorem 2.5. We think that $\mathbb{E}(D_t^2)$ is in fact of the
order of $\mathbb{E}(D_t)$ (instead of $\mathbb{E}(D_t)^2$). Indeed, with a rate of order $D_t$, a nonsimultaneous
jump occurs at time t and then $D_t$ is again of order one. In the following lemma, we
introduce a simple Markov chain which captures the essential feature of the dynamics of $D_t$
(division by 2 with probability $1 - O(D_t)$) and we show that the expected position at time
n goes to zero exponentially fast as n goes to infinity. Additionally, the recursive equation
(15) plays the role of (8).
Figure 1: Here $(x,y) = (2,1)$ and $H = \delta_{1/2}$. This picture presents the three following
functions of time: $t \mapsto W_1(\mathcal{L}(X_t^x), \mathcal{L}(X_t^y))$ where $X^u$ is driven by (1) with $X_0 = u$ (blue
curve), $t \mapsto \mathbb{E}(|X_t^{x,y} - Y_t^{x,y}|)$ where $(X^{x,y}, Y^{x,y})$ is driven by (7) with $(X_0^{x,y}, Y_0^{x,y}) = (x,y)$
(red curve), and $t \mapsto (x - y)/(1 + 1.5(x - y)t)$, the upper bound (9) of Theorem 2.5 (green
curve). The first and second curves have been obtained by Monte-Carlo simulations.



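Lemma 4.2. Consider the homogeneous irreducible Markov chain $X = (X_n)_{n\ge0}$ with
state space $E = \{2^{-i},\ i \in \mathbb{N}\}$ such that, for any $n \ge 0$ and $x \in E$, on the event $\{X_n = x\}$,

$$X_{n+1} = \begin{cases} 1 & \text{with probability } x/2,\\ x/2 & \text{with probability } 1 - x/2.\end{cases}$$

Denote by $\mathbb{E}^1(X_n)$ the quantity $\mathbb{E}(X_n \mid X_0 = 1)$. Then, for any $n \ge 1$,

$$\mathbb{E}^1(X_{n+1}) = \mathbb{E}^1(X_n) - \frac{1}{4}\mathbb{E}^1(X_n^2) \qquad (15)$$

and there exists a constant $c > 0$ such that for any $n \ge 1$,

$$\mathbb{E}^1(X_n) \le \exp(-cn). \qquad (16)$$

Proof. The Markov chain X is transient (and converges to 0) since

$$p := \mathbb{P}\big(\forall n \ge 0,\ X_{n+1} = X_n/2 \mid X_0 = 1\big) = \prod_{i=1}^{\infty}\big(1 - 2^{-i}\big) > 0.$$

Since for any $n \ge 0$,

$$\mathbb{E}(X_{n+1} \mid X_n) = \frac{X_n}{2} + \frac{X_n}{2}\Big(1 - \frac{X_n}{2}\Big) = X_n - \frac{X_n^2}{4},$$

we get (15). In particular, $\mathbb{E}^1(X_1) = 3/4$ and $n \mapsto \mathbb{E}^1(X_n)$ is decreasing. Similarly, since $X_{n-1} \le 1$,

$$\mathbb{E}^1(X_n^2) = \frac{1}{4}\mathbb{E}^1(X_{n-1}^2) + \frac{1}{2}\mathbb{E}^1\Big(X_{n-1}\Big(1 - \frac{X_{n-1}^2}{4}\Big)\Big) \ge \frac{3}{8}\mathbb{E}^1(X_{n-1}) \ge \frac{3}{8}\mathbb{E}^1(X_n).$$

As a consequence, for any $n \ge 1$,

$$\mathbb{E}^1(X_{n+1}) \le \frac{29}{32}\,\mathbb{E}^1(X_n),$$

and (16) follows, with $c = \log(32/29)$, since for any $n \ge 1$

$$\mathbb{E}^1(X_n) \le \Big(\frac{29}{32}\Big)^{n-1}\mathbb{E}^1(X_1) \le \Big(\frac{29}{32}\Big)^{n}.$$

□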
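Since the chain of Lemma 4.2 started at 1 visits at most $n+1$ states up to time n, its first two moments can be computed exactly by propagating its law. The Python sketch below (names ours) does so; it can be used to check (15) and the exponential decay (16), for instance against $(29/32)^n$, a rate that follows from (15) together with $\mathbb{E}^1(X_n^2) \ge \frac{3}{8}\mathbb{E}^1(X_n)$, itself a consequence of $X_n \le 1$.

```python
def lemma_chain_moments(n_max):
    """Exact E^1(X_n) and E^1(X_n^2), n = 0..n_max, for the chain of Lemma 4.2:
    from x it jumps to 1 with probability x/2 and to x/2 with probability 1 - x/2."""
    law = {0: 1.0}                     # law of i, where X_n = 2**-i; start at X_0 = 1
    first, second = [], []
    for _ in range(n_max + 1):
        first.append(sum(p * 2.0 ** -i for i, p in law.items()))
        second.append(sum(p * 4.0 ** -i for i, p in law.items()))
        nxt = {}
        for i, p in law.items():
            reset = 2.0 ** -(i + 1)    # probability x/2 of jumping back to 1
            nxt[0] = nxt.get(0, 0.0) + p * reset
            nxt[i + 1] = nxt.get(i + 1, 0.0) + p * (1.0 - reset)
        law = nxt
    return first, second
```

Comparing the computed ratios $\mathbb{E}^1(X_{n+1})/\mathbb{E}^1(X_n)$ with $29/32$ gives an idea of how conservative the constant c of the proof is.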